Abstract - Most past work on identifying unexpected activities in video has focused on looking for specific patterns of anomalous
activities. In this paper, we consider the situation where we have a known set A of activities (normal and abnormal) that we wish
to monitor. However, in addition, we wish to identify abnormal activities that have not been previously considered or encountered,
i.e. they are not in A. We formally define the probability that a video sequence is unexplained (totally or partially) w.r.t. A. We
develop efficient algorithms to identify the top-k Totally and Partially Unexplained Activities in a video w.r.t. A. Our algorithms
exploit mathematical properties of these definitions for efficiency. We describe experiments using two real-world datasets showing
that our approach works well in practice in terms of both running time and accuracy.



1  Introduction 

Video  surveillance  is  omnipresent.  For  instance,  airport 
baggage  areas  are  continuously  monitored  for  suspicious 
activities.  In  crime-ridden  neighborhoods,  police  often 
monitor  streets  and  parking  lots  using  video  surveillance. 
In  Israel,  highways  are  monitored  by  a  central  authority  for 
suspicious  activities.  However,  all  these  applications  search 
for  known  activities  -  activities  that  have  been  identified 
in  advance  as  being  either  “normal”  or  “abnormal”.  For 
instance,  in  the  highway  application,  security  officers  may 
look  both  for  normal  behavior  (e.g.  driving  along  the 
highway  in  a  certain  speed  range  unless  traffic  is  slow) 
as  well  as  “suspicious”  behavior  (e.g.  stopping  the  car  near 
a  bridge,  taking  a  package  out  and  leaving  it  on  the  side  of 
the  road  before  driving  away). 

In  this  paper,  we  are  given  a  set  A  of  activity  definitions 
expressed  as  stochastic  automata  with  temporal  constraints 
(extending [1]).¹ A can contain “normal” activities or
“suspicious”  activities  or  both.  We  then  try  to  find  video 
sequences  that  are  not  “sufficiently  explained”  by  any  of  the 
activities  in  A.  For  instance,  in  an  airport,  we  may  know  of 
certain  patterns  that  are  suspicious,  but  we  may  also  know 
that  there  are  many  activity  patterns  that  a  criminal/terrorist 
may  use  that  we  cannot  possibly  predict.  Such  “unknown” 
activities  are  “unexplained”  in  our  framework. 

We achieve this via a possible-worlds-based model and
define the probability that a sequence of video is totally (or
partially) unexplained. Based on this, users can specify a
probability threshold and look for all sequences that are
totally (or partially) unexplained with a probability exceeding
the threshold. We then show several properties that can be
leveraged to make the search for unexplained activities
more efficient. We define algorithms to find the top-k totally
and partially unexplained activities. We develop a prototype
implementation and report on experiments using two
datasets showing that the algorithms work well in practice,
in terms of both efficiency and accuracy.

The  paper  starts  (Section  2)  with  an  overview  of  related 
work.  Section  3  provides  basic  definitions  of  stochastic 
activities  slightly  extending  [1].  Section  4  defines  the 
probability  that  a  video  sequence  is  totally  (or  partially) 
unexplained. We also define the problem of finding the top-k
(totally or partially) unexplained activities and classes.
Section 5 derives theorems that enable fast search for
totally and partially unexplained video sequences. Section 6
presents algorithms for solving the problems introduced in
Section 4. Section 7 describes our experiments. The paper
concludes in Section 8.²

2  Related  Work 

A Priori Definitions. Several researchers have studied
how to search for specifically defined patterns of
normal/abnormal activities [2]. [3] studies how HMMs can
be used to recognize complex activities, while [4] and [5]
use coupled HMMs. [6] uses Dynamic Bayesian Networks
(DBNs) to capture causal relationships between observations
and hidden states. [1] developed a stochastic automaton-based
language to detect activities in video, while
[7] presented an HMM-based algorithm. In contrast, this
paper starts with a set A of activity models (corresponding
to normal/abnormal activities) and finds video sequences
that are not sufficiently explained by the models in A.
Such unexplained sequences reflect activity occurrences for
which no model exists a priori.

2. All proofs are reported in a detachable appendix included for the
convenience of the reviewers.

Learning and then detecting abnormality. Several
researchers first learn normal activity models and then detect
abnormal/unusual events. [8] suggests a semi-supervised
approach to detect abnormal events that are rare,
unexpected, and relevant. We do not require “unexplained”
events to be either rare or relevant. [9] uses HMMs to detect
rare events, while [10] defines an anomaly as an atypical
behavior pattern that is not represented by sufficient samples
in a training dataset and satisfies an abnormal pattern. [11]
defines abnormality as unseen or rarely occurring events;
an initial video is used to learn normal behaviors. [12]
shows how to detect users with abnormal activities from
sensors attached to human bodies. An abnormal activity is
defined as “an event that occurs rarely and has not been
expected in advance”. Abnormal activities become normal
when they start to occur more often. The same notion of
abnormal activity is considered in [13] and [14]. [15] learns
patterns of activities over time in an unsupervised way. [16]
deals with detecting individual anomalies in crowd scenes;
an anomaly is defined as a rare or infrequent behavior
compared to all other behaviors. The normality/abnormality
of an individual behavior is evaluated w.r.t. a specific
context. Then, usual activities are accepted as normal and
deviant activity patterns are flagged as abnormal. All these
approaches first learn normal activity models and then
detect abnormal/unusual events. These papers differ from
ours because they consider rare events to be abnormal. In
contrast, we consider activities to be abnormal even if they
are not rare. For example, if a new way to break into cars
has proliferated during the past month, then we want to flag
those activities as “unexplained” even if they are no longer
rare. In addition, if a model exists for a rare activity, we
would flag it as normal, while many of these frameworks
would not.

Similarity-based abnormality. [17] proposes an unsupervised
technique in which no activity model is required a
priori and no explicit models of normal activities are built.
Each  event  in  the  video  is  compared  with  all  other  observed 
events  to  determine  how  many  similar  events  exist.  Unusual 
events  are  events  for  which  there  are  no  similar  events  in 
the  video.  Hence,  this  work  also  considers  unusual  activity 
as  a  rare  event  and  a  large  number  of  observations  is 
required  to  verify  if  an  activity  is  indeed  unusual.  [18] 
uses  a  similar  approach:  a  scene  is  considered  anomalous 
when  the  maximum  similarity  between  the  scene  and  all 
previously  viewed  scenes  is  below  a  threshold.  This  is 
also similar to [19], where frequently occurring patterns are
normal and patterns that are dissimilar from most patterns
are anomalous. [20] learns trajectory prototypes and detects
anomalous  behaviors  when  visual  trajectories  deviate  from 
the  self-learned  representations  of  typical  behaviors.  In  [3], 
activities  performed  by  a  group  of  moving  and  interacting 
objects  are  modeled  as  shapes  and  abnormal  activities  are 
then  defined  as  a  change  in  the  shape  activity  model. 

Other relevant work. [21] develops an algorithm that
collects low-level scene observations representing routine
activities. Unusual events are detected by monitoring the
scene with monitors which extract local low-level observations
from the video stream. Given a new observation, the
monitor computes the likelihood of this observation with
respect to the probability distribution of prior observations.
If the likelihood falls below a certain threshold, then the
monitor outputs an alert. The local alerts issued by the
monitors are then combined. [22] automatically learns
high-frequency events (taking spatio-temporal aspects into
account) and declares them normal; then, events deviating
from these rules are anomalies.

3  Basic  Activity  Model 

This  section  extends  the  stochastic  activity  model  of  [1] 
(though  we  make  no  claims  of  novelty  for  this).  We 
assume the existence of a finite set $\mathcal{S}$ of action symbols,
corresponding  to  atomic  actions  that  can  be  detected  by 
image  understanding  methods. 

Definition 3.1 (Stochastic activity): A stochastic activity
is a labeled directed graph $A = (V, E, \delta, \rho)$ where

• $V$ is a finite set of nodes labeled with action symbols
from $\mathcal{S}$;

• $E \subseteq V \times V$ is a set of edges;

• $\delta : E \to \mathbb{N}^+$ associates, with each edge $(v_i, v_j)$, an
upper bound on the time that can elapse between $v_i$
and $v_j$;

• $\rho$ is a function that associates, with each node $v \in V$
having out-degree 1 or more, a probability distribution
on $\{(v, v') \mid (v, v') \in E\}$, i.e., $\sum_{(v,v') \in E} \rho((v, v')) = 1$;

• $\{v \in V \mid \nexists\, v' \in V \text{ s.t. } (v', v) \in E\} \neq \emptyset$, i.e., there
exists at least one start node in the activity definition;

• $\{v \in V \mid \nexists\, v' \in V \text{ s.t. } (v, v') \in E\} \neq \emptyset$, i.e., there
exists at least one end node in the activity definition.

Figure 1 shows an example of a stochastic activity modeling
deposits at an Automatic Teller Machine (ATM). Each
edge $e$ is labeled with $(\delta(e), \rho(e))$. For instance, the two
edges starting at node insertCard mean that there is a
50% probability of going to node insertChecks and a
50% probability of going to node insertCash from node
insertCard. In addition, it is required that insertChecks
and insertCash follow insertCard within 2 and 1 time
units, respectively. For the purpose of this paper, this
example is simplified (e.g., we avoided talking about the
customer typing on the keypad, etc.). In general, actions can
be easily detected by either an image processing algorithm
(e.g. detectPerson would check if a person is present in
the image) or a sensor (e.g. to detect if insertCard holds).


Fig.  1:  Example  of  stochastic  activity:  ATM  deposit 
This framework extends [1] by adding the function $\delta$,
which expresses a constraint on the maximum “temporal
distance” between two actions in an activity.
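The conditions of Definition 3.1 can be checked mechanically. The following is a minimal sketch, not part of the original framework's implementation: the function name `validate_activity` and the dictionary-based encoding of $\delta$ and $\rho$ are assumptions made for illustration, and the sample graph contains only the two insertCard edges whose labels are stated in the text.

```python
from math import isclose

def validate_activity(nodes, delta, rho):
    """Check the conditions of Definition 3.1 (hypothetical encoding).

    nodes: set of node names (each labeled with an action symbol)
    delta: dict mapping edge (v, v2) -> positive integer time bound
    rho:   dict mapping edge (v, v2) -> transition probability
    """
    edges = set(delta)
    if edges != set(rho):
        return False                      # delta and rho must cover the same edges
    if any(v not in nodes or v2 not in nodes for v, v2 in edges):
        return False                      # E must be a subset of V x V
    if any(not (isinstance(t, int) and t > 0) for t in delta.values()):
        return False                      # time bounds are positive integers
    for v in nodes:
        out = [rho[e] for e in edges if e[0] == v]
        if out and not isclose(sum(out), 1.0):
            return False                  # outgoing probabilities must sum to 1
    starts = {v for v in nodes if all(e[1] != v for e in edges)}
    ends = {v for v in nodes if all(e[0] != v for e in edges)}
    return bool(starts) and bool(ends)    # at least one start and one end node

# Fragment of the ATM example: only these two edges are described in the text.
nodes = {"insertCard", "insertChecks", "insertCash"}
delta = {("insertCard", "insertChecks"): 2, ("insertCard", "insertCash"): 1}
rho = {("insertCard", "insertChecks"): 0.5, ("insertCard", "insertCash"): 0.5}
ok = validate_activity(nodes, delta, rho)
```

Storing $\delta$ and $\rho$ as dictionaries keyed by edge keeps the edge set $E$ implicit, which is one of several reasonable encodings.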

Definition 3.2 (Stochastic activity instance): An
instance of a stochastic activity $(V, E, \delta, \rho)$ is a sequence
$\langle s_1, \ldots, s_m \rangle$ of nodes in $V$ such that

• $(s_i, s_{i+1}) \in E$ for $1 \le i < m$;

• $\{s \mid (s, s_1) \in E\} = \emptyset$, i.e., $s_1$ is a start node; and

• $\{s \mid (s_m, s) \in E\} = \emptyset$, i.e., $s_m$ is an end node.

The probability of the instance is $\prod_{i=1}^{m-1} \rho((s_i, s_{i+1}))$.

Thus, an instance of a stochastic activity $A$ is a path
in $A$ from a start node to an end node. In Figure 1,
$\langle$detectPerson, insertCard, insertCash, withdrawCard$\rangle$ is an
instance with probability 0.35. Throughout this paper, we
assume an arbitrary but fixed set $\mathcal{A}$ of stochastic activities.
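To make the notion of instance concrete, the sketch below enumerates all start-to-end paths of an acyclic activity and multiplies the edge probabilities along each path. The graph is hypothetical: Figure 1 is not reproduced here, so the edge probabilities other than the two 50% insertCard edges are assumptions chosen only so that the instance mentioned in the text has probability 0.35. The helper name `instances` is also an assumption.

```python
def instances(edges_rho):
    """Yield (path, probability) for every start-to-end path.

    edges_rho: dict mapping edge (v, v2) -> transition probability rho.
    Assumes the graph is acyclic; a cycle would make the recursion endless.
    """
    nodes = {v for e in edges_rho for v in e}
    succ = {v: [w for (u, w) in edges_rho if u == v] for v in nodes}
    starts = [v for v in nodes if all(w != v for (_, w) in edges_rho)]

    def walk(path, prob):
        last = path[-1]
        if not succ[last]:                 # end node: no outgoing edges
            yield tuple(path), prob
            return
        for nxt in succ[last]:
            yield from walk(path + [nxt], prob * edges_rho[(last, nxt)])

    for s in sorted(starts):
        yield from walk([s], 1.0)

# Hypothetical probabilities (only the two 0.5 values come from the text).
rho = {
    ("detectPerson", "insertCard"): 1.0,
    ("insertCard", "insertChecks"): 0.5,
    ("insertCard", "insertCash"): 0.5,
    ("insertChecks", "withdrawCard"): 0.8,
    ("insertChecks", "withdrawCash"): 0.2,
    ("insertCash", "withdrawCard"): 0.7,
    ("insertCash", "withdrawCash"): 0.3,
    ("withdrawCash", "withdrawCard"): 1.0,
}
paths = dict(instances(rho))
p_example = paths[("detectPerson", "insertCard", "insertCash", "withdrawCard")]
```

Because every node with outgoing edges carries a probability distribution over them, the probabilities of all instances of an acyclic activity sum to 1.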

A video is a finite sequence of frames. Each frame $f$
has an associated timestamp, denoted $f.ts$; without loss of
generality, we assume timestamps to be positive integers. A
labeling $\ell$ of a video $v$ is a mapping $\ell : v \to 2^{\mathcal{S}}$ that takes
a video frame $f \in v$ as input, and returns a set of action
symbols $\ell(f) \subseteq \mathcal{S}$ as output. Intuitively, a labeling can
be computed via an appropriate suite of image processing
algorithms and specifies the actions detected in each frame
of a video.

Example 3.1: Consider a video $v = \langle f_1, f_2, f_3, f_4, f_5 \rangle$,
with $f_i.ts = i$ for $1 \le i \le 5$. A possible labeling $\ell$
of $v$ is: $\ell(f_1) = \{$detectPerson$\}$, $\ell(f_2) = \{$insertCard$\}$,
$\ell(f_3) = \{$insertCash$\}$, $\ell(f_4) = \{$withdrawCash$\}$, $\ell(f_5) = \{$withdrawCard$\}$.

Suppose $S_1 = \langle a_1, \ldots, a_n \rangle$ and $S_2 = \langle b_1, \ldots, b_m \rangle$ are
two sequences. $S_2$ is a subsequence of $S_1$ iff there exist
$1 \le j_1 < j_2 < \ldots < j_m \le n$ s.t. $b_i = a_{j_i}$ for $1 \le i \le m$.
If $j_i = j_{i+1} - 1$ for $1 \le i < m$, then $S_2$ is a contiguous
subsequence of $S_1$. We write $S_1 \cap S_2 \neq \emptyset$ iff $S_1$ and $S_2$ have
a common element and write $e \in S_1$ iff $e$ is an element
appearing in $S_1$. The concatenation of $S_1$ and $S_2$, i.e., the
sequence $\langle a_1, \ldots, a_n, b_1, \ldots, b_m \rangle$, is denoted by $S_1 \cdot S_2$.
Finally, we use $|S_1|$ to denote the length of $S_1$, that is, the
number of elements in $S_1$.
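The two notions of subsequence can be illustrated with a short sketch. The function names are assumptions; the first function returns one witness index list $j_1 < \ldots < j_m$ (0-based, found greedily from the left), and the second tests contiguity directly, since a greedy witness need not be the contiguous one.

```python
def subsequence_indices(s1, s2):
    """Return 0-based indices j_1 < ... < j_m with s2[i] == s1[j_i],
    or None if s2 is not a subsequence of s1 (greedy leftmost match)."""
    idx, start = [], 0
    for b in s2:
        try:
            j = s1.index(b, start)        # first match at or after `start`
        except ValueError:
            return None
        idx.append(j)
        start = j + 1
    return idx

def is_contiguous_subsequence(s1, s2):
    """True iff s2 appears in s1 as a block of consecutive elements."""
    m = len(s2)
    return any(list(s1[k:k + m]) == list(s2) for k in range(len(s1) - m + 1))

frames = ["f1", "f2", "f3", "f4", "f5"]
picked = ["f1", "f2", "f3", "f5"]          # skips f4
witness = subsequence_indices(frames, picked)
contiguous = is_contiguous_subsequence(frames, picked)
```

Here `picked` is a subsequence of `frames` but not a contiguous one, which is exactly the situation of an activity occurrence that skips a frame.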

We  now  define  an  occurrence  of  a  stochastic  activity  in 
a  video. 

Definition 3.3 (Activity occurrence): Let $v$ be a video, $\ell$
a labeling of $v$, and $A = (V, E, \delta, \rho)$ a stochastic activity.
An occurrence $o$ of $A$ in $v$ w.r.t. $\ell$ is a sequence
$\langle (f_1, s_1), \ldots, (f_m, s_m) \rangle$ such that

• $\langle f_1, \ldots, f_m \rangle$ is a subsequence of $v$,

• $\langle s_1, \ldots, s_m \rangle$ is an instance of $A$,

• $s_i \in \ell(f_i)$, for $1 \le i \le m$, and³

• $f_{i+1}.ts - f_i.ts \le \delta((s_i, s_{i+1}))$, for $1 \le i < m$.

The probability of $o$, denoted $p(o)$, is the probability of the
instance $\langle s_1, \ldots, s_m \rangle$.

When concurrently monitoring multiple activities, shorter
activity instances generally tend to have higher probability.
To remedy this, we normalize occurrence probabilities by
introducing the relative probability $p^*(o)$ of an occurrence
$o$ of activity $A$ as $p^*(o) = \frac{p(o)}{p_{max}}$, where $p_{max}$ is the highest
probability of any instance of $A$.

3. With a slight abuse of notation, we use $s_i$ to refer to both node $s_i$
and the action symbol labeling it.

Example 3.2: Consider the video and the labeling of
Example 3.1. An occurrence of the activity of Figure 1 is
$o = \langle (f_1,$ detectPerson$), (f_2,$ insertCard$), (f_3,$ insertCash$),
(f_5,$ withdrawCard$) \rangle$, and $p^*(o) = 0.875$.
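The labeling and timing conditions of an occurrence, and the normalization to a relative probability, can be sketched as follows. This is an illustration under stated assumptions: the function name `is_occurrence` is hypothetical, the time bounds other than the insertCard-to-insertCash bound of 1 are invented for the example, and $p_{max} = 0.4$ is not stated in the text but is the value implied by $p(o) = 0.35$ and $p^*(o) = 0.875$.

```python
def is_occurrence(occ, labeling, ts, delta):
    """Check the labeling and timing conditions of an activity occurrence.

    occ:      list of (frame, symbol) pairs, in video order
    labeling: dict frame -> set of action symbols detected in that frame
    ts:       dict frame -> integer timestamp
    delta:    dict (symbol, next_symbol) -> maximum allowed time gap
    The caller is assumed to have checked that the symbols form an instance.
    """
    if any(s not in labeling[f] for f, s in occ):
        return False                       # every symbol must label its frame
    for (f1, s1), (f2, s2) in zip(occ, occ[1:]):
        if ts[f2] <= ts[f1]:
            return False                   # frames must follow video order
        if (s1, s2) not in delta or ts[f2] - ts[f1] > delta[(s1, s2)]:
            return False                   # temporal constraint violated
    return True

labeling = {"f1": {"detectPerson"}, "f2": {"insertCard"}, "f3": {"insertCash"},
            "f4": {"withdrawCash"}, "f5": {"withdrawCard"}}
ts = {f"f{i}": i for i in range(1, 6)}
# Hypothetical bounds; only insertCard -> insertCash (1) is given in the text.
delta = {("detectPerson", "insertCard"): 1,
         ("insertCard", "insertCash"): 1,
         ("insertCash", "withdrawCard"): 2}
occ = [("f1", "detectPerson"), ("f2", "insertCard"),
       ("f3", "insertCash"), ("f5", "withdrawCard")]
valid = is_occurrence(occ, labeling, ts, delta)

p, p_max = 0.35, 0.4       # p from the text; p_max implied by p*(o) = 0.875
p_star = p / p_max
```

With a tighter bound on the last edge (for instance 1 instead of 2), the same sequence would no longer qualify, because frame $f_4$ is skipped.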

We use $\mathcal{O}(v, \ell)$ to denote the set of all activity occurrences
in $v$ w.r.t. $\ell$. Whenever $v$ and $\ell$ are clear from the
context, we write $\mathcal{O}$ instead of $\mathcal{O}(v, \ell)$.

4  Unexplained  Activity  Probability  Model

This section defines the probability that a video sequence
is unexplained by $\mathcal{A}$. We note that the occurrence of an
activity in a video can involve conflicts. For instance,
consider the activity occurrence $o$ in Example 3.2 and
suppose there is a second activity occurrence $o'$ such that
$(f_1,$ detectPerson$) \in o'$. In this case, there is an implicit
conflict because $(f_1,$ detectPerson$)$ belongs to both occurrences,
but in fact, detectPerson can only belong to one
activity occurrence, i.e. though $o$ and $o'$ may both have a
non-zero probability, the probability that these two activity
occurrences coexist is 0. Formally, we say two activity
occurrences $o, o'$ conflict, denoted $o \nsim o'$, iff $o \cap o' \neq \emptyset$.
We now use this to define possible worlds.

Definition 4.1 (Possible world): A possible world for a
video $v$ and a labeling $\ell$ is a subset $w$ of $\mathcal{O}$ s.t. there are no
two distinct occurrences $o_i, o_j \in w$ with $o_i \nsim o_j$.

Thus,  a  possible  world  is  a  set  of  activity  occurrences 
which  do  not  conflict  with  one  another,  i.e.,  an  action 
symbol  in  a  frame  cannot  belong  to  two  distinct  activity 
occurrences in the same world. We use $\mathcal{W}(v, \ell)$ to denote
the set of all possible worlds for a video $v$ and a labeling
$\ell$; whenever $v$ and $\ell$ are clear from the context, we simply
write $\mathcal{W}$.

Example 4.1: Consider a video with two conflicting occurrences
$o_1, o_2$. There are 3 possible worlds: $w_0 = \emptyset$,
$w_1 = \{o_1\}$, and $w_2 = \{o_2\}$. Note that $\{o_1, o_2\}$ is not
a world as $o_1 \nsim o_2$. Each world represents a way of
explaining what is observed. The first world corresponds to
the case where nothing is explained, the second and third
worlds correspond to the scenarios where we use one of
the two possible occurrences to explain the observed action
symbols.
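The possible worlds of a small set of occurrences can be enumerated by brute force. The sketch below is illustrative only (it is exponential in the number of occurrences and is not the search strategy developed later in the paper); the function name and the concrete (frame, symbol) pairs are assumptions.

```python
from itertools import combinations

def possible_worlds(occurrences):
    """Enumerate all conflict-free subsets of the given occurrences.

    occurrences: dict name -> set of (frame, symbol) pairs.
    Two occurrences conflict iff they share a (frame, symbol) pair.
    """
    names = sorted(occurrences)
    worlds = []
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            if all(not (occurrences[a] & occurrences[b])
                   for a, b in combinations(subset, 2)):
                worlds.append(frozenset(subset))
    return worlds

# Two hypothetical occurrences sharing (f1, detectPerson), hence conflicting.
two = {
    "o1": {("f1", "detectPerson"), ("f2", "insertCard")},
    "o2": {("f1", "detectPerson"), ("f3", "insertCash")},
}
worlds_two = possible_worlds(two)

# A chain o1 ~ o2 ~ o3 where o1 and o3 do not conflict (integers stand in
# for (frame, symbol) pairs).
chain = {"o1": {1, 2}, "o2": {2, 3}, "o3": {3, 4}}
worlds_chain = possible_worlds(chain)
```

The empty world is always included, matching the observation that possible worlds need not be maximal.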

Note that any subset of $\mathcal{O}$ not containing conflicting
occurrences is a legitimate possible world; possible
worlds are not required to be maximal w.r.t. $\subseteq$. In the
above example, the empty set is a possible world even
though there are two other possible worlds $w_1 = \{o_1\}$ and
$w_2 = \{o_2\}$ which are supersets of it. The reason is that
$o_1$ and $o_2$ are uncertain, so the scenario where neither $o_1$
nor $o_2$ occurs is a legitimate one. We further illustrate this
point below.
Fig.  2:  Conflict-Based  Partitioning  of  a  video


Example 4.2: Suppose we have a video where a single
occurrence $o$ has $p^*(o) = 0.6$. In this case, it is natural to
say that there are two possible worlds $w_0 = \emptyset$ and $w_1 = \{o\}$
and expect the probabilities of $w_0$ and $w_1$ to be 0.4 and 0.6,
respectively. If we restricted ourselves to maximal possible
worlds only, we would have only one possible world, $w_1$,
whose probability is 1, which is wrong.

We use $\nsim^*$ to denote the transitive closure of $\nsim$. Clearly,
$\nsim^*$ is an equivalence relation and determines a partition of
$\mathcal{O}$ into equivalence classes $\mathcal{O}_1, \ldots, \mathcal{O}_m$.

Example 4.3: Suppose we have a video
$v = \langle f_1, \ldots, f_{16} \rangle$ and a labeling $\ell$ such that five
occurrences $o_1, o_2, o_3, o_4, o_5$ are detected as depicted in
Figure 2, that is, $o_1 \nsim o_2$, $o_2 \nsim o_3$, and $o_4 \nsim o_5$. There
are two equivalence classes determined by $\nsim^*$, namely
$\mathcal{O}_1 = \{o_1, o_2, o_3\}$ and $\mathcal{O}_2 = \{o_4, o_5\}$.
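The equivalence classes of the transitive closure of the conflict relation are the connected components of the conflict graph. One standard way to compute them is a union-find structure; the sketch below is an illustration with assumed names, applied to the conflicts listed in the example above.

```python
def conflict_classes(names, conflicts):
    """Partition `names` into the classes induced by the transitive
    closure of the symmetric relation given by `conflicts` (pairs)."""
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for a, b in conflicts:
        parent[find(a)] = find(b)           # merge the two classes

    classes = {}
    for n in names:
        classes.setdefault(find(n), set()).add(n)
    return sorted(sorted(c) for c in classes.values())

classes = conflict_classes(
    ["o1", "o2", "o3", "o4", "o5"],
    [("o1", "o2"), ("o2", "o3"), ("o4", "o5")],
)
```

Note that $o_1$ and $o_3$ end up in the same class even though they do not conflict directly, because both conflict with $o_2$.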

The equivalence classes determined by $\nsim^*$ lead to a
conflict-based partitioning of a video.

Definition 4.2 (Conflict-Based Partitioning): Let $v$ be a
video, $\ell$ a labeling, and $\mathcal{O}_1, \ldots, \mathcal{O}_m$ the equivalence classes
determined by $\nsim^*$. A Conflict-Based Partitioning (CBP) of
$v$ (w.r.t. $\ell$) is a sequence $\langle (v_1, \ell_1), \ldots, (v_m, \ell_m) \rangle$ such that:

• $v_1 \cdot \ldots \cdot v_m = v$;

• $\ell_i$ is the restriction of $\ell$ to $v_i$, i.e., it is a labeling of
$v_i$ s.t. $\forall f \in v_i$, $\ell_i(f) = \ell(f)$, for $1 \le i \le m$; and

• $\mathcal{O}(v_i, \ell_i) = \mathcal{O}_i$, for $1 \le i \le m$.

The $v_i$'s are called segments.

Example 4.4: A CBP of the video in Example 4.3
is $\langle (v_1, \ell_1), (v_2, \ell_2) \rangle$, where $v_1 = \langle f_1, \ldots, f_9 \rangle$,
$v_2 = \langle f_{10}, \ldots, f_{16} \rangle$, and $\ell_1$ and $\ell_2$ are the restrictions of $\ell$ to $v_1$ and
$v_2$, respectively. Another partitioning of the same video is
the one where $v_1 = \langle f_1, \ldots, f_{10} \rangle$ and $v_2 = \langle f_{11}, \ldots, f_{16} \rangle$.

Thus, activity occurrences determine a set of possible
worlds (intuitively, different ways of explaining the video).
We wish to find a probability distribution over all possible
worlds that (i) is consistent with the relative probabilities
of the occurrences, and (ii) takes conflicts into account. We
assume the user specifies a function $Weight : \mathcal{A} \to \mathbb{R}^+$
which assigns a weight to each activity and prioritizes
the importance of the activity.⁴ The weight of an occurrence
$o$ of activity $A$ is the weight of $A$. We use $C(o)$
to denote the set of occurrences conflicting with $o$, i.e.,
$C(o) = \{o' \mid o' \in \mathcal{O} \wedge o' \nsim o\}$. Note that $o \in C(o)$;
and $C(o) = \{o\}$ when $o$ does not conflict with any other
occurrence. Finally, we assume that activity occurrences
belonging to different segments are independent events.
Suppose $p_i$ denotes the (unknown) probability of world $w_i$.
As we know the probability of occurrences, and as each
occurrence occurs in certain worlds, we can induce a set
of nonlinear constraints that will subsequently be used to
learn the values of the $p_i$'s.

4. For instance, highly threatening activities may be assigned a high
weight.

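Definition 4.3: Let $v$ be a video, $\ell$ a labeling, and
$\mathcal{O}_1, \ldots, \mathcal{O}_m$ the equivalence classes determined by $\nsim^*$. We
define the non-linear constraints $NLC(v, \ell)$ as follows:

$$
\begin{cases}
p_i \ge 0, & \forall w_i \in \mathcal{W} \\[4pt]
\sum_{w_i \in \mathcal{W}} p_i = 1 \\[4pt]
\sum_{w_i \in \mathcal{W} \text{ s.t. } o \in w_i} p_i = p^*(o) \cdot \dfrac{Weight(o)}{\sum_{o_j \in C(o)} Weight(o_j)}, & \forall o \in \mathcal{O} \\[8pt]
p_j = \prod_{k=1}^{m} \; \sum_{w_i \in \mathcal{W} \text{ s.t. } w_i \cap \mathcal{O}_k = w_j \cap \mathcal{O}_k} p_i, & \forall w_j \in \mathcal{W}
\end{cases}
$$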

The first two types of constraints enforce a probability
distribution over the set of possible worlds. The third type
of constraint ensures that the probability of occurrence
$o$, which is the sum of the probabilities of the worlds
containing $o$, is equal to its relative probability $p^*(o)$
weighted by $\frac{Weight(o)}{\sum_{o_j \in C(o)} Weight(o_j)}$, the latter being the weight
of $o$ divided by the sum of the weights of the occurrences
conflicting with $o$. Note that: (i) the value on the right-hand
side of the third type of constraint decreases as the
amount of conflict increases, (ii) if an occurrence $o$ is not
conflicting with any other occurrence, then its probability
$\sum_{w_i \in \mathcal{W} \text{ s.t. } o \in w_i} p_i$ is equal to $p^*(o)$, i.e. the probability
returned by the stochastic automaton. The last kind of
constraint reflects the independence between segments. In
general $NLC(v, \ell)$ might admit multiple solutions.

Example 4.5: Consider a single-segment video consisting
of frames $f_1, \ldots, f_9$ shown in Figure 2. Suppose the
three occurrences $o_1, o_2, o_3$ have been detected with relative
probabilities 0.3, 0.6, and 0.5, respectively. Suppose the
weights of $o_1, o_2, o_3$ are 1, 2, 3, respectively. Five worlds
are possible in this case: $w_0 = \emptyset$, $w_1 = \{o_1\}$, $w_2 = \{o_2\}$,
$w_3 = \{o_3\}$, and $w_4 = \{o_1, o_3\}$. Then, $NLC(v, \ell)$ is:⁵

$$
\begin{array}{l}
p_i \ge 0, \quad 0 \le i \le 4 \\
p_0 + p_1 + p_2 + p_3 + p_4 = 1 \\
p_1 + p_4 = 0.3 \cdot \frac{1}{3} \\
p_2 = 0.6 \cdot \frac{1}{3} \\
p_3 + p_4 = 0.5 \cdot \frac{3}{5}
\end{array}
$$

which has multiple solutions. One solution is $p_0 = 0.4$,
$p_1 = 0.1$, $p_2 = 0.2$, $p_3 = 0.3$, $p_4 = 0$. Another solution is
$p_0 = 0.5$, $p_1 = 0$, $p_2 = 0.2$, $p_3 = 0.2$, $p_4 = 0.1$.
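The two solutions above can be checked against the linear constraints with exact rational arithmetic. The sketch below is only a verification aid with assumed names; the conflict sets follow from the conflicts $o_1 \nsim o_2$ and $o_2 \nsim o_3$ of Figure 2, and the independence constraints are omitted because they hold trivially for a single segment.

```python
from fractions import Fraction as F

worlds = {0: set(), 1: {"o1"}, 2: {"o2"}, 3: {"o3"}, 4: {"o1", "o3"}}
p_star = {"o1": F(3, 10), "o2": F(6, 10), "o3": F(5, 10)}
weight = {"o1": 1, "o2": 2, "o3": 3}
# C(o): occurrences conflicting with o, including o itself.
conflicts = {"o1": {"o1", "o2"}, "o2": {"o1", "o2", "o3"}, "o3": {"o2", "o3"}}

def satisfies_lc(p):
    """Check non-negativity, normalization and the per-occurrence equalities."""
    if any(x < 0 for x in p.values()) or sum(p.values()) != 1:
        return False
    for o in p_star:
        lhs = sum(p[i] for i, w in worlds.items() if o in w)
        rhs = p_star[o] * F(weight[o], sum(weight[c] for c in conflicts[o]))
        if lhs != rhs:
            return False
    return True

sol_a = {0: F(4, 10), 1: F(1, 10), 2: F(2, 10), 3: F(3, 10), 4: F(0)}
sol_b = {0: F(5, 10), 1: F(0), 2: F(2, 10), 3: F(2, 10), 4: F(1, 10)}
both_ok = satisfies_lc(sol_a) and satisfies_lc(sol_b)
```

Using `Fraction` avoids floating-point rounding when testing the equalities exactly.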

In the rest of the paper, we assume that
$NLC(v, \ell)$ is solvable.⁶ We say that a sequence
$S = \langle (f_1, s_1), \ldots, (f_n, s_n) \rangle$ occurs in a video $v$ w.r.t. a
labeling $\ell$ iff $\langle f_1, \ldots, f_n \rangle$ is a contiguous subsequence of
$v$ and $s_i \in \ell(f_i)$ for $1 \le i \le n$. We give two semantics
for $S$ to be unexplained in a world $w \in \mathcal{W}$:

1) $S$ is totally unexplained in $w$, denoted $w \nvDash_T S$, iff
$\forall (f_i, s_i) \in S$, $\nexists\, o \in w$, $(f_i, s_i) \in o$;

2) $S$ is partially unexplained in $w$, denoted $w \nvDash_P S$, iff
$\exists (f_i, s_i) \in S$, $\nexists\, o \in w$, $(f_i, s_i) \in o$.

Intuitively, $S$ is totally (resp. partially) unexplained in $w$
iff $w$ explains none (resp. fails to explain at least one) of the
symbols of $S$. We now define the probability that a sequence occurring
in a video is totally or partially unexplained.

5. For brevity, we do not explicitly list the independence constraints.

6. This can be easily checked either via a non-linear constraint solver
or via the methods developed in the next section.

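Definition 4.4: Let $v$ be a video, $\ell$ a labeling, and $S$ a
sequence occurring in $v$ w.r.t. $\ell$. The probability interval
that $S$ is totally unexplained in $v$ w.r.t. $\ell$ is $\mathcal{I}_T(S) = [l, u]$,
where:

$$
\begin{array}{ll}
l = & \text{minimize } \sum_{w_i \in \mathcal{W} \text{ s.t. } w_i \nvDash_T S} p_i \\
    & \text{subject to } NLC(v, \ell) \\[4pt]
u = & \text{maximize } \sum_{w_i \in \mathcal{W} \text{ s.t. } w_i \nvDash_T S} p_i \\
    & \text{subject to } NLC(v, \ell)
\end{array}
$$

The probability interval that $S$ is partially unexplained in
$v$ w.r.t. $\ell$ is $\mathcal{I}_P(S) = [l', u']$, where $l', u'$ are derived in
exactly the same way as $l, u$ above by replacing the $\nvDash_T$
symbols in the above optimization problems by $\nvDash_P$.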
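As a worked illustration of these optimization problems, consider the constraints of Example 4.5. This is a sketch under an added assumption: the sequence $S$ below is hypothetical and is taken to be totally unexplained only in the empty world $w_0$; the parameter $t$ is introduced here for convenience and does not appear in the original text.

```latex
% Free parameter t = p_4; the equalities of Example 4.5 then force
p_1 = 0.1 - t, \quad p_2 = 0.2, \quad p_3 = 0.3 - t, \quad p_0 = 0.4 + t,
\qquad 0 \le t \le 0.1 .
% If w_0 is the only world with w_0 \nvDash_T S (assumption), then
l = \min_{t \in [0, 0.1]} (0.4 + t) = 0.4, \qquad
u = \max_{t \in [0, 0.1]} (0.4 + t) = 0.5, \qquad
\mathcal{I}_T(S) = [0.4, 0.5].
```

The upper bound on $t$ comes from $p_1 \ge 0$, and the two endpoints correspond to the two solutions listed in Example 4.5 ($t = 0$ and $t = 0.1$).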

Thus, the probability that a sequence $S$ occurring in $v$
is totally (resp. partially) unexplained w.r.t. a solution of
$NLC(v, \ell)$ is the sum of the probabilities of the worlds
in which $S$ is totally (resp. partially) unexplained. As
$NLC(v, \ell)$ may have multiple solutions, we find the tightest
interval $[l, u]$ (resp. $[l', u']$) containing this probability
for any solution. Different criteria can be used to infer a
value from an interval $[l, u]$, e.g. the MIN $l$, the MAX $u$, the
average (i.e., $(l + u)/2$), etc. Clearly, the only requirement
is that this value has to be in $[l, u]$. In the rest of the paper
we assume that one of the above criteria has been chosen;
$\mathcal{P}_T(S)$ (resp. $\mathcal{P}_P(S)$) denotes the probability that $S$ is
totally (resp. partially) unexplained.

Proposition 4.1: Consider two sequences $S_1$ and $S_2$
occurring in a video. If $S_1$ is a subsequence of $S_2$, then
$\mathcal{P}_T(S_1) \ge \mathcal{P}_T(S_2)$ and $\mathcal{P}_P(S_1) \le \mathcal{P}_P(S_2)$.

We  now  define  totally  and  partially  unexplained  activity 
occurrences. 

Definition 4.5 (Unexplained activity occurrence): Let $v$
be a video, $\ell$ a labeling, $\tau \in [0, 1]$ a probability threshold,
and $L \in \mathbb{N}^+$ a length threshold. Then,

• a totally unexplained activity occurrence is a sequence
$S$ occurring in $v$ s.t. (i) $\mathcal{P}_T(S) \ge \tau$, (ii) $|S| \ge L$, and
(iii) $S$ is maximal, i.e., there does not exist a sequence
$S' \neq S$ occurring in $v$ s.t. $S$ is a subsequence of $S'$,
$\mathcal{P}_T(S') \ge \tau$, and $|S'| \ge L$.

• a partially unexplained activity occurrence is a sequence
$S$ occurring in $v$ s.t. (i) $\mathcal{P}_P(S) \ge \tau$, (ii)
$|S| \ge L$, and (iii) $S$ is minimal, i.e., there does not
exist a sequence $S' \neq S$ occurring in $v$ s.t. $S'$ is a
subsequence of $S$, $\mathcal{P}_P(S') \ge \tau$, and $|S'| \ge L$.

In the definition above, $L$ is the minimum length a
sequence must be for it to be considered a possible unexplained
activity occurrence. Totally unexplained activities
(TUAs for short) $S$ have to be maximal because once we
find $S$, any subsequence of it is (totally) unexplained with
probability greater than or equal to that of $S$. On the other
hand, partially unexplained activities (PUAs for short) $S'$
have to be minimal because once we find $S'$, any
super-sequence of it is (partially) unexplained with probability
greater than or equal to that of $S'$.

Intuitively, an unexplained activity occurrence is a sequence
of action symbols that are observed in the video
and  poorly  explained  by  the  known  activity  models.  Such 
sequences  might  correspond  to  unknown  variants  of  known 
activities  or  to  entirely  new  -  and  unknown  -  activities. 
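The maximality condition for totally unexplained activities can be illustrated by a brute-force sketch. This is not one of the algorithms developed later in the paper: it assumes one action symbol per frame (so a sequence is just a window of frames), and it assumes an oracle `prob_total(i, j)` that returns the probability that window $[i, j)$ is totally unexplained. The toy oracle below is invented for the example and merely respects the monotonicity of Proposition 4.1.

```python
from math import prod

def maximal_tuas(n, prob_total, tau, min_len):
    """Brute-force search for maximal windows [i, j) over n frames whose
    totally-unexplained probability is >= tau and whose length is >= min_len.

    prob_total(i, j) is an assumed oracle, not part of the original text.
    """
    ok = [(i, j) for i in range(n) for j in range(i + min_len, n + 1)
          if prob_total(i, j) >= tau]
    # Keep only windows not strictly contained in another qualifying window.
    return [(i, j) for (i, j) in ok
            if not any(a <= i and j <= b and (a, b) != (i, j) for a, b in ok)]

# Toy oracle: each frame has a hypothetical "explained" mass; a window is
# totally unexplained with the product of the complementary masses.
explained = [0.9, 0.1, 0.0, 0.1, 0.9, 0.0]

def prob_total(i, j):
    return prod(1 - e for e in explained[i:j])

result = maximal_tuas(len(explained), prob_total, tau=0.8, min_len=2)
```

In this toy setting the windows (1, 3) and (2, 4) also pass the threshold, but they are dropped because both are contained in (1, 4).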

An Unexplained Activity Problem (UAP) instance is a
4-tuple $I = \langle v, \ell, \tau, L \rangle$ where $v$ is a video, $\ell$ is a labeling,
$\tau \in [0, 1]$ is a probability threshold, and $L \in \mathbb{N}^+$ is a length
threshold. We want to find the sets $\mathcal{A}^{tu}(I)$ and $\mathcal{A}^{pu}(I)$ of
all totally and partially unexplained activities, respectively.
When $I$ is clear from context, we will drop it.

The following definition introduces the top-k totally
and partially unexplained activities. Intuitively, these are
$k$ unexplained activities having maximum probability.

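Definition 4.6 (Top-k unexplained activities): Consider
a UAP instance and let $k \in \mathbb{N}^+$. $\mathcal{A}^{tu}_k \subseteq \mathcal{A}^{tu}$ (resp.
$\mathcal{A}^{pu}_k \subseteq \mathcal{A}^{pu}$) is a set of top-k totally (resp. partially)
unexplained activities iff $|\mathcal{A}^{tu}_k| = \min\{k, |\mathcal{A}^{tu}|\}$ (resp.
$|\mathcal{A}^{pu}_k| = \min\{k, |\mathcal{A}^{pu}|\}$), and $\forall S \in \mathcal{A}^{tu}_k, \forall S' \in \mathcal{A}^{tu} - \mathcal{A}^{tu}_k$
(resp. $\forall S \in \mathcal{A}^{pu}_k, \forall S' \in \mathcal{A}^{pu} - \mathcal{A}^{pu}_k$) $\mathcal{P}_T(S) \ge \mathcal{P}_T(S')$
(resp. $\mathcal{P}_P(S) \ge \mathcal{P}_P(S')$).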

Suppose we have a UAP instance. For any $S, S' \in \mathcal{A}^{tu}$
(resp. $S, S' \in \mathcal{A}^{pu}$), we write $S =_T S'$ (resp. $S =_P S'$) iff
$\mathcal{P}_T(S) = \mathcal{P}_T(S')$ (resp. $\mathcal{P}_P(S) = \mathcal{P}_P(S')$). Obviously, $=_T$
(resp. $=_P$) is an equivalence relation and determines a set
$\mathcal{C}^{tu}$ (resp. $\mathcal{C}^{pu}$) of equivalence classes. For any equivalence
class $C \in \mathcal{C}^{tu}$ (resp. $C \in \mathcal{C}^{pu}$) we define $\mathcal{P}_T(C)$ (resp.
$\mathcal{P}_P(C)$) as the (unique) probability of the sequences in $C$.

The top-k totally and partially unexplained classes are
the $k$ classes having maximum probability. Compared with
the top-k unexplained activities, here we want to return all
the unexplained activities having the $k$ highest probabilities.

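Definition 4.7 (Top-k unexplained classes): Consider a
UAP instance and let $k \in \mathbb{N}^+$. $\mathcal{C}^{tu}_k \subseteq \mathcal{C}^{tu}$ (resp. $\mathcal{C}^{pu}_k \subseteq \mathcal{C}^{pu}$)
is the set of top-k totally (resp. partially) unexplained
classes iff $|\mathcal{C}^{tu}_k| = \min\{k, |\mathcal{C}^{tu}|\}$ (resp. $|\mathcal{C}^{pu}_k| = \min\{k, |\mathcal{C}^{pu}|\}$),
and $\forall C \in \mathcal{C}^{tu}_k, \forall C' \in \mathcal{C}^{tu} - \mathcal{C}^{tu}_k$ (resp.
$\forall C \in \mathcal{C}^{pu}_k, \forall C' \in \mathcal{C}^{pu} - \mathcal{C}^{pu}_k$) $\mathcal{P}_T(C) > \mathcal{P}_T(C')$ (resp.
$\mathcal{P}_P(C) > \mathcal{P}_P(C')$).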
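The difference between top-k activities and top-k classes can be seen on a small example. The sketch below uses assumed function names and hypothetical probabilities; it treats two sequences as equivalent when their probabilities are exactly equal, which is a simplification if the probabilities are floating-point values.

```python
def top_k_activities(probs, k):
    """Return one valid set of top-k activities (ties broken by identifier).

    probs: dict sequence-id -> probability of being unexplained.
    """
    ranked = sorted(probs, key=lambda s: (-probs[s], s))
    return ranked[:k]

def top_k_classes(probs, k):
    """Group sequences with equal probability and return the k classes
    with the highest probability, as (probability, members) pairs."""
    classes = {}
    for s, p in probs.items():
        classes.setdefault(p, []).append(s)
    best = sorted(classes, reverse=True)[:k]
    return [(p, sorted(classes[p])) for p in best]

# Hypothetical unexplained sequences and their probabilities.
probs = {"S1": 0.9, "S2": 0.9, "S3": 0.8, "S4": 0.7}
acts = top_k_activities(probs, 2)
cls = top_k_classes(probs, 2)
```

With $k = 2$, the top-k activities are two sequences, whereas the top-k classes cover three sequences because the two highest distinct probabilities are 0.9 and 0.8.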

5  Properties  of  UAPs 

This section derives properties of UAPs that can be leveraged
(in the next section) to devise efficient algorithms
to solve UAPs. We first show an interesting property
concerning the solution of $NLC(v, \ell)$ (some subsequent
results rely on it); then, in the following two subsections,
we consider specific properties for totally and partially
unexplained activities.
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For  a  given  video  v  and  labeling  £,  we  now  show  that 
if  (vm,£m))  is  a  CBP,  then  we  can  find 

the  solutions  of  the  non-linear  constraints  NLC(v,£)  by 
solving  m  smaller  sets  of  linear  constraints.1  We  define 
LC(v,  £)  as  the  set  of  linear  constraints  of  NLC(v ,  £)  (thus, 
we  include  all  the  constraints  of  Definition  4.3  except  for 
the  last  kind).  Henceforth,  we  use  W  to  denote  >V(u,  /) 
and  Wi  to  denote  1  <  i  <  m.  A  solution  of 

NLC(v,£)  is  a  mapping  V  :  W  [0,1]  which  satisfies 
NLC(v ,  £).  Likewise,  a  solution  of  LC(vi,£f)  is  a  mapping 
Vi  :  Wi  — »  [0, 1]  which  satisfies  LC(yi^£f).  It  is  important 
to  note  that  W  =  {wi  U  . . .  U  |  Wi  G  Wi,  1  <  i  <  m}. 

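Theorem 1: Let $v$ be a video, $\ell$ a labeling, and
$\langle (v_1, \ell_1), \ldots, (v_m, \ell_m) \rangle$ a CBP. $\mathcal{P}$ is a solution of
$NLC(v, \ell)$ iff $\forall i \in [1, m]$ there exists a solution $\mathcal{P}_i$
of $LC(v_i, \ell_i)$ s.t. $\mathcal{P}(\bigcup_{i=1}^{m} w_i) = \prod_{i=1}^{m} \mathcal{P}_i(w_i)$ for every
$w_1 \in \mathcal{W}_1, \ldots, w_m \in \mathcal{W}_m$.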

The  following  example  illustrates  the  previous  theorem. 

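Example 5.1: Consider the video $v$ and the labeling $\ell$
of Example 4.3 (cf. Figure 2). As shown in Example 4.4,
one possible CBP of $v$ and $\ell$ is $\langle (v_1, \ell_1), (v_2, \ell_2) \rangle$, where
$v_1 = \langle f_1, \ldots, f_9 \rangle$, $v_2 = \langle f_{10}, \ldots, f_{16} \rangle$, and $\ell_1$ and $\ell_2$ are the
restrictions of $\ell$ to $v_1$ and $v_2$, respectively. Theorem 1 says
that for each solution $\mathcal{P}$ of $NLC(v, \ell)$, there is a solution
$\mathcal{P}_1$ of $LC(v_1, \ell_1)$ and a solution $\mathcal{P}_2$ of $LC(v_2, \ell_2)$ s.t.
$\mathcal{P}(w_1 \cup w_2) = \mathcal{P}_1(w_1) \times \mathcal{P}_2(w_2)$ for every $w_1 \in \mathcal{W}_1, w_2 \in \mathcal{W}_2$,
and vice versa.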
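
The product form in Theorem 1 can be sketched in Python as follows. This is only an illustration: the per-block distributions `P1` and `P2` and the occurrence names `o1`, `o2`, `o3` are hypothetical, and worlds are modelled as frozensets of occurrence names.

```python
from itertools import product

def combine(block_solutions):
    """Build the joint distribution over W from per-block distributions,
    using P(w_1 U ... U w_m) = P_1(w_1) * ... * P_m(w_m)."""
    joint = {}
    for combo in product(*(sol.items() for sol in block_solutions)):
        world = frozenset().union(*(w for w, _ in combo))
        p = 1.0
        for _, p_i in combo:
            p *= p_i
        joint[world] = p
    return joint

# Hypothetical solutions of LC(v1, l1) and LC(v2, l2).
P1 = {frozenset(): 0.4, frozenset({"o1"}): 0.6}
P2 = {frozenset(): 0.5, frozenset({"o2"}): 0.3, frozenset({"o3"}): 0.2}
P = combine([P1, P2])   # six worlds, one per pair (w1, w2)
```

Because each block distribution sums to 1, the combined distribution does as well, which is what makes solving the blocks separately sufficient.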

Consider a video $v$ and a labeling $\ell$, and let
$\langle (v_1, \ell_1), \ldots, (v_m, \ell_m) \rangle$ be a CBP. Given a sequence
$S = \langle (f_1, s_1), \ldots, (f_q, s_q) \rangle$ occurring in $v$, we say that
$v_i, v_{i+1}, \ldots, v_{i+n}$ ($1 \le i \le i + n \le m$) are the sub-videos
containing $S$ iff $f_1 \in v_i$ and $f_q \in v_{i+n}$. In other words, $S$
spans the sub-videos $v_i, v_{i+1}, \ldots, v_{i+n}$: it starts at a point
in sub-video $v_i$ (as $v_i$ contains the first frame of $S$) and
ends at some point in sub-video $v_{i+n}$ (as $v_{i+n}$ contains the
last frame of $S$). $S_k$ denotes the projection of $S$ on the k-th
sub-video $v_k$ ($i \le k \le i + n$), that is, the subsequence of
$S$ containing all the pairs $(f, s) \in S$ with $f \in v_k$.

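Example 5.2: Suppose we have a video
$v = \langle f_1, \ldots, f_{21} \rangle$ and a labeling $\ell$ such that
$\ell(f_i) = \{s_i\}$ for $1 \le i \le 21$. In addition, suppose
8 occurrences are detected as shown in Figure 3.
Consider the CBP $\langle (v_1, \ell_1), (v_2, \ell_2), (v_3, \ell_3), (v_4, \ell_4) \rangle$,
where $v_1 = \langle f_1, \ldots, f_5 \rangle$, $v_2 = \langle f_6, \ldots, f_{10} \rangle$,
$v_3 = \langle f_{11}, \ldots, f_{16} \rangle$, $v_4 = \langle f_{17}, \ldots, f_{21} \rangle$, and $\ell_i$ is
the restriction of $\ell$ to $v_i$, for $1 \le i \le 4$.

Consider now the sequence
$S = \langle (f_8, s_8), \ldots, (f_{14}, s_{14}) \rangle$ occurring in $v$. Then, $v_2$
and $v_3$ are the sub-videos containing $S$. Moreover,
$S_2$ denotes $\langle (f_8, s_8), \ldots, (f_{10}, s_{10}) \rangle$, and $S_3$ denotes
$\langle (f_{11}, s_{11}), \ldots, (f_{14}, s_{14}) \rangle$.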
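
As a small illustration of the notions of containing sub-videos and projections, the following Python sketch computes both for a sequence given by its first and last frame indices. The block boundaries are those of the four-block CBP in Example 5.2; the function name and the representation of blocks as index pairs are assumptions made for this sketch.

```python
def containing_subvideos(blocks, first, last):
    """blocks: list of (start_frame, end_frame) pairs, 1-based and contiguous.
    Returns {block_index: (proj_start, proj_end)} for the sequence
    running from frame `first` to frame `last`."""
    projections = {}
    for k, (b_start, b_end) in enumerate(blocks, start=1):
        lo, hi = max(first, b_start), min(last, b_end)
        if lo <= hi:  # the sequence has at least one frame in block k
            projections[k] = (lo, hi)
    return projections

blocks = [(1, 5), (6, 10), (11, 16), (17, 21)]   # v1..v4 of Example 5.2
result = containing_subvideos(blocks, 8, 14)     # {2: (8, 10), 3: (11, 14)}
```

The result says that the sequence from frame 8 to frame 14 is contained in the second and third blocks, with projections covering frames 8 to 10 and 11 to 14.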

7.  This  therefore  yields  two  benefits:  first  it  allows  us  to  solve  a  smaller 
set  of  constraints,  and  second,  it  allows  us  to  solve  linear  constraints  which 
are  usually  easier  to  solve  than  nonlinear  ones. 


Fig. 3: Conflict-Based Partitioning of a video


5.1  Totally  unexplained  activities 

The following theorem says that we can compute $\mathcal{I}_T(S)$
by solving LC (which are linear constraints) for each sub-video
containing $S$ (instead of solving a non-linear set of
constraints for the whole video).

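Theorem 2: Consider a video $v$ and a labeling $\ell$. Let
$\langle (v_1, \ell_1), \ldots, (v_m, \ell_m) \rangle$ be a CBP and $\langle v_i, \ldots, v_{i+n} \rangle$ be
the sub-videos containing a sequence $S$ occurring in $v$. For
$i \le k \le i + n$, let

$$l_k = \text{minimize} \sum_{w_h \in \mathcal{W}_k \text{ s.t. } w_h \nvDash_T S_k} p_h \quad \text{subject to } LC(v_k, \ell_k)$$

$$u_k = \text{maximize} \sum_{w_h \in \mathcal{W}_k \text{ s.t. } w_h \nvDash_T S_k} p_h \quad \text{subject to } LC(v_k, \ell_k)$$

If $\mathcal{I}_T(S) = [l, u]$, then $l = \prod_{k=i}^{i+n} l_k$ and $u = \prod_{k=i}^{i+n} u_k$.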

The  following  example  illustrates  the  theorem  above. 

Example 5.3: Consider the setting of Example 5.2,
which is depicted in Figure 3. By definition, $\mathcal{I}_T(S)$ can
be computed by solving the non-linear program of Definition 4.4
for the whole video $v$. Alternatively, Theorem 2
says that $\mathcal{I}_T(S)$ can be computed as $\mathcal{I}_T(S) = [l_2 \times l_3, u_2 \times u_3]$,
where $l_2, u_2, l_3, u_3$ are computed as defined in
Theorem 2, that is, by solving two smaller linear programs
for $v_2$ and $v_3$.
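
Once the per-sub-video linear programs have been solved, the combination step of Theorem 2 is a plain product of bounds. The Python sketch below shows only that step; the numeric bounds are hypothetical values standing in for the LP optima and are not taken from the experiments.

```python
from math import prod

def combine_bounds(block_bounds):
    """Multiply per-sub-video bounds [l_k, u_k] to obtain [l, u]."""
    lower = prod(l for l, _ in block_bounds)
    upper = prod(u for _, u in block_bounds)
    return lower, upper

# Hypothetical optima (l_2, u_2) and (l_3, u_3) for the sub-videos v2 and v3.
l, u = combine_bounds([(0.5, 0.8), (0.6, 0.9)])
```

Each linear program ranges over the worlds of a single sub-video only, which is where the saving over the non-linear program for the whole video comes from.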


The following theorem provides a sufficient condition for
a pair $(f, s)$ not to be included in any sequence $S$ occurring
in $v$ and having $\mathcal{P}_T(S) \ge \tau$.


Theorem 3: Let $(v, \ell, \tau, L)$ be a UAP instance. Given
$(f, s)$ s.t. $f \in v$ and $s \in \ell(f)$, let

$$\varepsilon = \sum_{o \in O \text{ s.t. } (f,s) \in o} p^*(o) \cdot \frac{Weight(o)}{\sum_{o_j \in C(o)} Weight(o_j)}.$$

If $\varepsilon > 1 - \tau$, then there does not
exist a sequence $S$ occurring in $v$ s.t. $(f, s) \in S$ and
$\mathcal{P}_T(S) \ge \tau$.


If the above condition holds for a pair $(f, s)$, then we
say that $(f, s)$ is sufficiently explained. Note that to check
whether a pair $(f, s)$ is sufficiently explained, we do not
need to solve any set of linear or non-linear constraints,
since $\varepsilon$ is computed by simply summing the (weighted)
probabilities of the occurrences containing $(f, s)$. Thus,
this result yields a further gain in efficiency. A frame $f$ is sufficiently
explained iff $(f, s)$ is sufficiently explained for every
$s \in \ell(f)$. If $(f, s)$ is sufficiently explained, then it can
be disregarded for the purpose of identifying unexplained
activity occurrences, and, in addition, this may allow us to
disregard entire parts of videos as shown in the example
below.

Example 5.4: Consider a UAP instance $(v, \ell, \tau, L)$
where $v = \langle f_1, \ldots, f_9 \rangle$ and $\ell$ is s.t. $\ell(f_i) = \{s_i\}$ for
$1 \le i \le 9$, as depicted in Figure 4.

Fig. 4: Sufficiently explained frames in a video.

Suppose $L = 3$ and $(f_1, s_1)$, $(f_4, s_4)$, $(f_6, s_6)$ are
sufficiently explained. Even though we could apply the
theorem to only a few $(f_i, s_i)$ pairs, we can conclude that
no unexplained activity occurrence can be found before $f_7$,
because $L = 3$.
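
The test for a sufficiently explained pair only needs a weighted sum, which the following Python sketch makes concrete. The dictionary layout, the field names, and the numbers are hypothetical; in particular, `conflict_weights` is assumed to list Weight(o_j) for every o_j in C(o), with C(o) taken here to include o itself.

```python
def epsilon(pair, occurrences):
    """Sum the weighted probabilities of the occurrences containing `pair`."""
    total = 0.0
    for occ in occurrences:
        if pair in occ["pairs"]:
            total += occ["p_star"] * occ["weight"] / sum(occ["conflict_weights"])
    return total

def sufficiently_explained(pair, occurrences, tau):
    """Condition of Theorem 3: epsilon > 1 - tau."""
    return epsilon(pair, occurrences) > 1 - tau

# Two hypothetical, mutually conflicting occurrences.
occurrences = [
    {"pairs": {("f1", "s1"), ("f2", "s2")}, "p_star": 0.9,
     "weight": 2.0, "conflict_weights": [2.0, 1.0]},
    {"pairs": {("f2", "s2"), ("f3", "s3")}, "p_star": 0.6,
     "weight": 1.0, "conflict_weights": [2.0, 1.0]},
]
eps = epsilon(("f2", "s2"), occurrences)   # 0.9 * 2/3 + 0.6 * 1/3
```

With a threshold of 0.6, the pair (f2, s2) passes the test in this toy setting while (f3, s3) does not, and no constraint solving is involved in either check.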

Given a UAP instance $I = (v, \ell, \tau, L)$ and a subsequence
$v'$ of $v$, $v'$ is relevant iff (i) $v'$ is a contiguous subsequence
of $v$, (ii) $|v'| \ge L$, (iii) $\forall f \in v'$, $f$ is not sufficiently
explained, and (iv) $v'$ is maximal (i.e., there does not exist
$v'' \ne v'$ s.t. $v'$ is a subsequence of $v''$ and $v''$ satisfies (i),
(ii), (iii)). We use $relevant(I)$ to denote the set of relevant
sub-videos.

Theorem 3 entails that relevant sub-videos can be individually
considered when looking for totally unexplained
activities because there is no totally unexplained activity
spanning two different relevant sub-videos.
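
Relevant sub-videos are the maximal runs of at least L frames that are not sufficiently explained, so they can be found in a single pass. The Python sketch below assumes the per-frame flags have already been computed; the flag list reproduces a nine-frame setting with L = 3 in which frames 1, 4 and 6 are taken to be sufficiently explained.

```python
def relevant_subvideos(explained, L):
    """explained[i] is True iff frame i+1 is sufficiently explained.
    Returns maximal runs (start, end), 1-based inclusive, of length >= L
    consisting only of frames that are not sufficiently explained."""
    runs, start = [], None
    for i, flag in enumerate(explained, start=1):
        if not flag and start is None:
            start = i                      # a new candidate run begins
        elif flag and start is not None:
            if i - start >= L:             # run covers frames start..i-1
                runs.append((start, i - 1))
            start = None
    if start is not None and len(explained) - start + 1 >= L:
        runs.append((start, len(explained)))
    return runs

flags = [True, False, False, True, False, True, False, False, False]
runs = relevant_subvideos(flags, 3)   # [(7, 9)]
```

The runs covering frames 2 to 3 and frame 5 are dropped because they are shorter than L, leaving only frames 7 to 9 to be examined.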

5.2  Partially  unexplained  activities 

The following theorem states that we can compute $\mathcal{I}_P(S)$
by solving NLC for the sub-video consisting of the segments
containing $S$ (instead of solving NLC for the whole
video).

Theorem 4: Consider a video $v$ and a labeling $\ell$. Let
$\langle (v_1, \ell_1), \ldots, (v_m, \ell_m) \rangle$ be a CBP and $\langle v_i, \ldots, v_{i+n} \rangle$ be
the sub-videos containing a sequence $S$ occurring in $v$. Let
$v^* = v_i \cdot \ldots \cdot v_{i+n}$ and $\ell^*$ be a labeling for $v^*$ s.t., for every
$f \in v^*$, $\ell^*(f) = \ell(f)$. $\mathcal{I}_P(S)$ computed w.r.t. $v$ and $\ell$ is
equal to $\mathcal{I}_P(S)$ computed w.r.t. $v^*$ and $\ell^*$.

We now illustrate the use of the preceding theorem.

Example 5.5: Consider the setting of Example 5.2,
which is depicted in Figure 3. By definition, $\mathcal{I}_P(S)$ can
be computed by solving the non-linear program of Definition 4.4
for the whole video $v$. Alternatively, Theorem 4
says that $\mathcal{I}_P(S)$ can be computed by solving the non-linear
program of Definition 4.4 for the sub-video $v^* = v_2 \cdot v_3$.

6  Top-k  Algorithms

We now present algorithms to find top-k totally and
partially unexplained activities and classes. For ease of
presentation, we assume $|\ell(f)| = 1$ for every frame $f$ in
a video (this makes the algorithms much more concise;
generalization to the case of multiple action symbols per
frame is straightforward8). Given a video $v = \langle f_1, \ldots, f_n \rangle$,
we use $v(i, j)$ ($1 \le i \le j \le n$) to denote the sequence
$S = \langle (f_i, s_i), \ldots, (f_j, s_j) \rangle$, where $s_k$ is the only element
in $\ell(f_k)$, $i \le k \le j$.

8.  Indeed,  it  suffices  to  consider  the  different  sequences  given  by  the 
different  action  symbols. 


6.1  Top-k  TUA  and  TUC 

The Top-k TUA algorithm computes a set of top-k totally
unexplained activities in a video. Note that:

•  At every time, $lowest$ is defined as follows:

$$lowest = \begin{cases} -1 & \text{if } |TopSol| < k \\ \min\{\mathcal{P}_T(S) \mid S \in TopSol\} & \text{if } |TopSol| = k \end{cases}$$

•  On line 30, “Add $S$ to $TopSol$” works as follows:

-  If $|TopSol| < k$, then $S$ is added to $TopSol$;

-  otherwise, a sequence $S'$ in $TopSol$ having minimum
$\mathcal{P}_T(S')$ is replaced by $S$.
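
One way to realize the bookkeeping for TopSol and lowest is a bounded min-heap, sketched below in Python. The class name and its interface are assumptions made for illustration; the source does not prescribe a data structure.

```python
import heapq

class TopSol:
    """Bounded collection of the k most probable sequences found so far."""

    def __init__(self, k):
        self.k = k
        self._heap = []  # min-heap of (probability, sequence)

    @property
    def lowest(self):
        # -1 while fewer than k sequences are stored, so any probability beats it.
        return self._heap[0][0] if len(self._heap) == self.k else -1

    def add(self, seq, p):
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, (p, seq))
        else:
            # Evict a minimum-probability sequence and insert the new one.
            heapq.heapreplace(self._heap, (p, seq))

top = TopSol(2)
top.add("S1", 0.7)
top.add("S2", 0.9)
top.add("S3", 0.8)   # replaces S1, so lowest becomes 0.8
```

The sketch relies on the caller adding a sequence only when its probability exceeds lowest, as the algorithm does; under that discipline both operations take time logarithmic in k.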

Leveraging Theorem 3, Top-k TUA considers only relevant
sub-videos of $v$ individually (line 2). When it finds
a sequence $v'(start, end)$ of length at least $L$ having a
probability of being totally unexplained greater than $lowest$
(line 5), it makes the sequence maximal by adding frames
on the right (lines 7-14). Instead of adding one frame
at a time, $v'(start, end)$ is extended by $L$ frames at a
time until its probability drops below $\tau$ (lines 9-10); a
binary search is then performed to find the exact maximum
length of the unexplained activity (lines 15-25). Note that,
when making the sequence maximal, if at some point the
algorithm realizes that the unexplained activity will not
have a probability greater than $lowest$ (i.e. the sequence is
not a top-k TUA), then the sequence is disregarded and the
above process of making the sequence maximal is aborted
(lines 12-14 and 19-21). This kind of pruning allows the
algorithm to move forward in the video without computing
the exact ending frame of the TUA, thereby saving time.
Throughout the algorithm, $\mathcal{P}_T$ is computed by applying
Theorem 2.
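
The "extend by L frames, then binary search" idea can be sketched in Python as follows. This is a simplified variant and not a transcription of Algorithm 1: it omits the pruning on lowest, uses its own index bookkeeping, and assumes a hypothetical oracle `prob(start, end)` that does not increase as the sequence is extended to the right.

```python
def maximal_end(prob, start, end, n, L, tau):
    """Largest e in [end, n] with prob(start, e) >= tau, assuming that
    prob(start, end) >= tau and that prob never increases as e grows."""
    lo = hi = end                 # lo is always known to satisfy the threshold
    while hi < n:                 # move right, L frames at a time
        hi = min(hi + L, n)
        if prob(start, hi) < tau:
            break
        lo = hi
    else:
        return lo                 # reached the last frame without dropping below tau
    while hi - lo > 1:            # invariant: prob(lo) >= tau > prob(hi)
        mid = (lo + hi) // 2
        if prob(start, mid) >= tau:
            lo = mid
        else:
            hi = mid
    return lo

def toy_prob(s, e):
    """Toy oracle: loses 0.1 for every frame beyond the first."""
    return max(0, 10 - (e - s)) / 10

end_frame = maximal_end(toy_prob, 1, 3, 20, 3, 0.5)   # 6
```

In the toy run the oracle is evaluated at frames 6 and 9 during the extension phase and at frames 7 and 8 during the binary search, instead of at every frame from 4 onward.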


Theorem 5: Algorithm Top-k TUA returns a set of top-k
totally unexplained activities of the input instance.

Algorithm Top-k TUC modifies Top-k TUA as follows
to compute the top-k totally unexplained classes:

•  At every time, $lowest$ is defined as follows:

$$lowest = \begin{cases} -1 & \text{if } |TopSol| < k \\ \min\{\mathcal{P}_T(C) \mid C \in TopSol\} & \text{if } |TopSol| = k \end{cases}$$

•  “Add $S$ to $TopSol$” (line 30) works as follows:

-  If there exists $C \in TopSol$ s.t. $\mathcal{P}_T(C) = \mathcal{P}_T(S)$,
then $S$ is added to $C$;

-  else if $|TopSol| < k$, then the class $\{S\}$ is added
to $TopSol$;

-  otherwise the class $C$ in $TopSol$ having minimum
$\mathcal{P}_T(C)$ is replaced with $\{S\}$.

•  On line 5, $\mathcal{P}_T(v'(start, end)) > lowest$ is replaced
with $\mathcal{P}_T(v'(start, end)) \ge lowest$;

•  On line 12, $\mathcal{P}_T(v'(start, end)) \le lowest$ is replaced
with $\mathcal{P}_T(v'(start, end)) < lowest$;

•  On line 19, $\mathcal{P}_T(v'(start, mid)) \le lowest$ is replaced
with $\mathcal{P}_T(v'(start, mid)) < lowest$.

The algorithm obtained by applying the modifications
above is named Top-k TUC.

Theorem 6: Algorithm Top-k TUC returns the top-k
totally unexplained classes of the input instance.
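
The class-based version of "Add S to TopSol" can be sketched with a dictionary keyed by probability. The function below is a hypothetical illustration in Python; as in the algorithm, it is meant to be called only for sequences whose probability is at least lowest.

```python
def add_to_classes(top_sol, k, seq, p):
    """top_sol maps a probability to the list of sequences in that class."""
    if p in top_sol:                      # an equal-probability class exists
        top_sol[p].append(seq)
    elif len(top_sol) < k:                # room for a new class
        top_sol[p] = [seq]
    else:                                 # replace the minimum-probability class
        del top_sol[min(top_sol)]
        top_sol[p] = [seq]

classes = {}
for seq, p in [("S1", 0.7), ("S2", 0.9), ("S3", 0.9), ("S4", 0.8)]:
    add_to_classes(classes, 2, seq, p)
# classes now holds {0.9: ["S2", "S3"], 0.8: ["S4"]}
```

Here S3 joins the existing class of S2, and the class of S1 is evicted when S4 arrives, which mirrors the three cases listed above.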


Algorithm 1 Top-k TUA

Input: UAP instance $I = (v, \ell, \tau, L)$, $k \ge 1$
Output: Top-k totally unexplained activities

 1: $TopSol = \emptyset$
 2: for all $v' \in relevant(I)$ do
 3:   $start = 1$; $end = L$
 4:   repeat
 5:     if $\mathcal{P}_T(v'(start, end)) \ge \tau \wedge \mathcal{P}_T(v'(start, end)) > lowest$ then
 6:       $end' = end$
 7:       while $end < |v'|$ do
 8:         $end = \min\{end + L, |v'|\}$
 9:         if $\mathcal{P}_T(v'(start, end)) < \tau$ then
10:           break
11:         else
12:           if $\mathcal{P}_T(v'(start, end)) \le lowest$ then
13:             $end = end + 1$
14:             go to line 33
15:       $s = \max\{end - L, end'\}$; $e = end$
16:       while $e \ne s$ do
17:         $mid = \lceil (s + e)/2 \rceil$
18:         if $\mathcal{P}_T(v'(start, mid)) \ge \tau$ then
19:           if $\mathcal{P}_T(v'(start, mid)) \le lowest$ then
20:             $end = mid + 1$
21:             go to line 33
22:           else
23:             $s = mid$
24:         else
25:           $e = mid - 1$
26:       if $start > 1 \wedge \mathcal{P}_T(v'(start - 1, s)) \ge \tau$ then
27:         $end = s + 1$
28:         go to line 33
29:       else
30:         $S = v'(start, s)$; Add $S$ to $TopSol$
31:         $start = start + 1$; $end = s + 1$
32:     else
33:       $start = start + 1$; $end = \max\{end, start + L - 1\}$
34:   until $end > |v'|$
35: return $TopSol$


6.2  Top-k  PUA  and  PUC 

The Top-k PUA algorithm below computes a set of top-k
partially unexplained activities in a video. Note that:

•  At each time, $lowest$ is defined as follows:

$$lowest = \begin{cases} -1 & \text{if } |TopSol| < k \\ \min\{\mathcal{P}_P(S) \mid S \in TopSol\} & \text{if } |TopSol| = k \end{cases}$$

•  On line 43, “Add $S$ to $TopSol$” works as follows:

-  If $|TopSol| < k$, then $S$ is added to $TopSol$;

-  otherwise, a sequence in $TopSol$ having minimum
$\mathcal{P}_P$ is replaced by $S$.

To find an unexplained activity, Algorithm Top-k PUA
starts with a sequence of length at least $L$ and adds
frames to its right until its probability of being partially
unexplained reaches the threshold $\tau$. As in the case of
Top-k TUA, this is done by adding $L$ frames at a time
(lines 5-8) and then performing a binary search (lines 9-27).
When performing the binary search, if at some point
the algorithm realizes that the partially unexplained activity
will not have a probability greater than $lowest$, then the
sequence is disregarded and the binary search is aborted
(lines 17-19 and lines 24-25). Otherwise, the sequence is
shortened on the left, making it minimal (lines 28-38), by
performing a binary search instead of proceeding one frame
at a time. Once again, if the algorithm realizes that the
partially unexplained activity will not have a probability
greater than $lowest$, then the sequence is disregarded and
the shortening process is aborted (lines 34-36). This allows
the algorithm to avoid computing the exact starting frame
of the PUA, thus saving time. Note that $\mathcal{P}_P$ is computed
by applying Theorem 4.
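
The two searches described above (grow on the right until the threshold is reached, then shrink on the left) can be sketched in Python as follows. This is a simplified illustration rather than a transcription of Algorithm 2: it omits the pruning on lowest and the final check against the window ending one frame earlier, and it assumes a hypothetical oracle `prob(s, e)` that never decreases when the window is widened.

```python
def candidate_window(prob, start, n, L, tau):
    """Return (s, e) with prob(s, e) >= tau, e as small as possible for the
    given start and then s as large as possible, or None if no window of
    length >= L beginning at `start` reaches tau."""
    lo = start + L - 1
    if lo > n:
        return None
    hi = lo
    if prob(start, hi) < tau:
        # Extend the right end by L frames at a time.
        while hi < n and prob(start, hi) < tau:
            lo, hi = hi + 1, min(hi + L, n)
        if prob(start, hi) < tau:
            return None
        # Binary search for the smallest end in [lo, hi] reaching tau.
        while lo < hi:
            mid = (lo + hi) // 2
            if prob(start, mid) >= tau:
                hi = mid
            else:
                lo = mid + 1
    end = hi
    # Binary search for the largest start that keeps the window at or above tau.
    s_lo, s_hi = start, end - L + 1
    while s_lo < s_hi:
        mid = (s_lo + s_hi + 1) // 2
        if prob(mid, end) >= tau:
            s_lo = mid
        else:
            s_hi = mid - 1
    return s_lo, end

def toy(s, e):
    """Toy oracle: fraction of the frames 6..9 covered by the window [s, e]."""
    return sum(1 for f in (6, 7, 8, 9) if s <= f <= e) / 4

window = candidate_window(toy, 1, 12, 3, 0.75)   # (6, 8)
```

For clarity the sketch re-evaluates the oracle at some frames; an implementation in which each evaluation requires solving a non-linear program would cache those values.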


Algorithm 2 Top-k PUA

Input: UAP instance $I = (v, \ell, \tau, L)$, $k \ge 1$
Output: Top-k partially unexplained activities

 1: $TopSol = \emptyset$; $start = 1$; $end = L$
 2: while $end \le |v|$ do
 3:   if $\mathcal{P}_P(v(start, end)) < \tau$ then
 4:     $end' = end$
 5:     while $end < |v|$ do
 6:       $end = \min\{end + L, |v|\}$
 7:       if $\mathcal{P}_P(v(start, end)) \ge \tau$ then
 8:         break
 9:     if $\mathcal{P}_P(v(start, end)) \ge \tau$ then
10:       if $\mathcal{P}_P(v(start, end)) > lowest$ then
11:         $s = \max\{end' + 1, end - L + 1\}$; $e = end$
12:         while $e \ne s$ do
13:           $mid = \lfloor (s + e)/2 \rfloor$
14:           if $\mathcal{P}_P(v(start, mid)) < \tau$ then
15:             $s = mid + 1$
16:           else
17:             if $\mathcal{P}_P(v(start, mid)) \le lowest$ then
18:               $start = start + 1$; $end = mid + 1$
19:               go to line 2
20:             else
21:               $e = mid$
22:         $end = e$
23:       else
24:         $start = start + 1$; $end = end + 1$
25:         go to line 2
26:     else
27:       return $TopSol$
28:   $s' = start$; $e' = end - L + 1$
29:   while $e' \ne s'$ do
30:     $mid = \lceil (s' + e')/2 \rceil$
31:     if $\mathcal{P}_P(v(mid, end)) < \tau$ then
32:       $e' = mid - 1$
33:     else
34:       if $\mathcal{P}_P(v(mid, end)) \le lowest$ then
35:         $start = mid + 1$; $end = end + 1$
36:         go to line 2
37:       else
38:         $s' = mid$
39:   if $\mathcal{P}_P(v(s', end - 1)) \ge \tau \wedge |v(s', end - 1)| \ge L$ then
40:     $start = s' + 1$; $end = end + 1$
41:     go to line 2
42:   else
43:     $S = v(s', end)$; Add $S$ to $TopSol$
44:     $start = s' + 1$; $end = end + 1$
45: return $TopSol$


Theorem 7: Algorithm Top-k PUA returns a set of top-k
partially unexplained activities of the input instance.

Algorithm Top-k PUC modifies Top-k PUA as follows
to compute the top-k partially unexplained classes:

•  At every time, $lowest$ is defined as follows:

$$lowest = \begin{cases} -1 & \text{if } |TopSol| < k \\ \min\{\mathcal{P}_P(C) \mid C \in TopSol\} & \text{if } |TopSol| = k \end{cases}$$

•  “Add $S$ to $TopSol$” (line 43) works as follows:

-  If there exists $C \in TopSol$ s.t. $\mathcal{P}_P(C) = \mathcal{P}_P(S)$,
then $S$ is added to $C$;

-  else if $|TopSol| < k$, then the class $\{S\}$ is added
to $TopSol$;

-  otherwise the class $C$ in $TopSol$ having minimum
$\mathcal{P}_P(C)$ is replaced with $\{S\}$.

•  On line 10, $\mathcal{P}_P(v(start, end)) > lowest$ is replaced
with $\mathcal{P}_P(v(start, end)) \ge lowest$;

•  On line 17, $\mathcal{P}_P(v(start, mid)) \le lowest$ is replaced
with $\mathcal{P}_P(v(start, mid)) < lowest$;

•  On line 34, $\mathcal{P}_P(v(mid, end)) \le lowest$ is replaced
with $\mathcal{P}_P(v(mid, end)) < lowest$.

The algorithm obtained by applying the modifications
above is named Top-k PUC.

Theorem 8: Algorithm Top-k PUC returns the top-k
partially unexplained classes of the input instance.

7  Experimental  Evaluation 

Our prototype implementation of the proposed framework
consists of (i) an image processing library, which performs
low-level processing of video frames, including object
tracking and classification; (ii) a video labeler, which maps
frames to action symbols based on the output of the image
processing stage (i.e. gives a labeling of the video); (iii) an
activity recognition algorithm based on [1], which identifies
all possible occurrences of known activities (specified by
a set A) in the input video; and (iv) a UAP engine, which
implements Algorithms Top-k TUA, Top-k PUA, Top-k
TUC and Top-k PUC in 10,000 lines of Java code.

We  experimentally  evaluated  our  framework  in  terms 
of  both  running  time  and  accuracy  on  two  datasets:  (i) 
a  video  we  shot  by  monitoring  a  university  parking  lot, 
and  (ii)  a  benchmark  dataset  about  video  surveillance  in  an 
airport  [23]. 

7.1  Parking  lot  surveillance  video 

The  set  A  defined  in  this  case  includes  activities  such  as 
parking  a  car,  people  passing,  and  other  “known”  activities 
we  expect  to  occur  in  a  parking  lot. 

We  compared  Algorithms  Top-k  TUA  and  Top-k  PUA 
against  “naive”  algorithms  which  are  the  same  as  Top-k 
TUA  and  Top-k  PUA  but  do  not  exploit  the  optimizations 
provided  by  the  theorems  in  Section  5. 

Figures 5 and 6 show that Top-k TUA and Top-k PUA
significantly outperform the naive algorithms, which are not
able to scale beyond videos of length 15 and 10 minutes
for totally and partially unexplained activities, respectively
(with longer videos, the naive algorithms did not terminate
in 3 hours). Figures 7a and 8a zoom in on the running times
for Algorithms Top-k TUA and Top-k PUA, respectively.
The runtimes in Figure 5 when k = 5 and k = All
are almost the same (the two curves are indistinguishable)
because, up to 15 minutes, there were at most 5 totally
unexplained activities in the video. A similar argument
applies to Figure 6.

Fig. 5: Algorithm Top-k TUA vs. Naive.

We also evaluated how the different parameters that
define a UAP instance affect the running time by varying
the values of each parameter while keeping the others fixed
to a default value.

Runtime  of  Top-k  TUA.  Table  1  reports  the  values  we 
considered  for  each  parameter  along  with  the  corresponding 
default  value. 


Parameter    Values                   Default value
k            1, 2, 5, All             All
$\tau$       0.4, 0.6, 0.8            0.6
L            160, 200, 240, 280       200
# worlds     7E+04, 4E+05, 2E+07      2E+07

TABLE 1: Parameter values used in the experiments for
Algorithm Top-k TUA (parking lot dataset).

For example, Table 1 says that we measured the running
times to find the top-1, top-2, top-5, and all totally
unexplained activities (as the video length increases) while
keeping $\tau = 0.6$, $L = 200$, # worlds = 2E+07.
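
The "vary one parameter, keep the others at their defaults" design can be written down as a short Python sketch. The dictionary keys are hypothetical names chosen for this illustration; the values are the ones listed in Table 1.

```python
defaults = {"k": "All", "tau": 0.6, "L": 200, "worlds": 2e7}
values = {
    "k": [1, 2, 5, "All"],
    "tau": [0.4, 0.6, 0.8],
    "L": [160, 200, 240, 280],
    "worlds": [7e4, 4e5, 2e7],
}

def sweep(defaults, values):
    """Yield one configuration per (parameter, value) pair, with every
    other parameter held at its default."""
    for name, candidates in values.items():
        for value in candidates:
            yield name, {**defaults, name: value}

configs = list(sweep(defaults, values))   # 4 + 3 + 4 + 3 = 14 configurations
```

Each configuration would then be run on videos of increasing length, which corresponds to the horizontal axis of the running-time figures.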

Varying k. Figure 7a shows that lower values of k give
lower running times. As discussed in the preceding section,
Algorithm Top-k TUA can infer that some sequences
are not going to be top-k TUAs and quickly prune them: this
is effective with lower values of k because the probability
threshold to enter the current top-k TUAs (i.e., $lowest$
in Algorithm Top-k TUA) is higher, so it becomes harder
for a sequence to be added to the current top-k TUAs and the
pruning applied by Algorithm Top-k TUA becomes more
effective.

Varying $\tau$. Figure 7b shows that the runtime decreases as
the probability threshold grows. Intuitively, this is because
higher probability thresholds are a stricter requirement for
a sequence to be totally unexplained, so Algorithm Top-k
TUA can prune more.

Varying L. Figure 7c shows that higher values of L yield
lower running times, though there is not a big difference
between L = 200 and L = 240.

Fig. 6: Algorithm Top-k PUA vs. Naive.

Fig. 7: Running time of Algorithm Top-k TUA on the parking lot dataset,
plotted against video length (min). Panels: (a) varying k ($\tau$ = 0.6, L = 200);
(b) varying $\tau$ (k = All, L = 200); (c) varying L ($\tau$ = 0.6, k = All);
(d) varying the number of worlds ($\tau$ = 0.6, k = All, L = 200).

Varying Number of Possible Worlds. Finally, Figure 7d
shows that more possible worlds lead to higher running
times. However, note that big differences in the number of
possible worlds yield small differences in running times,
hence Algorithm Top-k TUA is able to scale well (this is
due to the application of Theorem 2 to compute $\mathcal{P}_T(S)$).

Runtime of Top-k PUA. When analyzing Top-k PUA, we
used the same parameter values in Table 1 except for k,
whose values were 1, 5, 10, All with default value All.

Varying k. The runtimes for k = 1, 5, 10 differ slightly
from each other and are much lower than when all PUAs
had to be found (Figure 8a).

Varying $\tau$. Figure 8b shows that the runtimes do not change
much for different values of $\tau$.

Varying L. Figure 8c shows that higher values of L lead
to lower runtimes.

Varying Number of Possible Worlds. Figure 8d shows that
higher numbers of possible worlds lead to higher runtimes.
As with TUAs, the runtime of Algorithm Top-k PUA
increases moderately despite the steep growth of possible
worlds. Moreover, runtimes of Top-k PUA are higher than
for Top-k TUA because computing $\mathcal{P}_P(S)$ requires solving
a non-linear program whereas $\mathcal{P}_T(S)$ requires solving
linear programs.

Precision/Recall. In order to assess accuracy, we compared
the output of Algorithms Top-k TUA and Top-k PUA
against ground truth provided by 8 human annotators who
were taught the meaning of graphical representations of
activities in A (e.g. Figure 1). They were asked to identify
the totally and partially unexplained sequences in the video
w.r.t. A. We ran Top-k TUA and Top-k PUA with values of
the probability threshold $\tau$ ranging from 0.4 to 0.8, looking
for all totally and partially unexplained activities in the
video (L was set to 200). We use $\{S^a_i\}_{i \in [1,m]}$ to denote
the set of unexplained sequences returned by our algorithms
and $\{S^h_j\}_{j \in [1,n]}$ to denote the set of sequences flagged as
unexplained by human annotators. Precision and recall were
computed as:

$$P = \frac{|\{S^a_i \mid \exists S^h_j \text{ s.t. } S^a_i \approx_p S^h_j\}|}{m}$$

$$R = \frac{|\{S^h_j \mid \exists S^a_i \text{ s.t. } S^a_i \approx_p S^h_j\}|}{n}$$

where $S^a_i \approx_p S^h_j$ means that $S^a_i$ and $S^h_j$ overlap by a
percentage no smaller than 75%.
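
The precision and recall computation can be sketched in Python as below. The text does not spell out how the overlap percentage is measured, so the sketch assumes one possible reading: the length of the intersection divided by the length of the longer sequence. The frame ranges are hypothetical.

```python
def overlap(a, b):
    """Fraction of the longer interval covered by the intersection
    (an assumed reading of 'overlap by a percentage')."""
    inter = min(a[1], b[1]) - max(a[0], b[0]) + 1
    if inter <= 0:
        return 0.0
    longer = max(a[1] - a[0], b[1] - b[0]) + 1
    return inter / longer

def precision_recall(found, truth, min_overlap=0.75):
    """found, truth: non-empty lists of (first_frame, last_frame) pairs."""
    hit_found = [a for a in found
                 if any(overlap(a, h) >= min_overlap for h in truth)]
    hit_truth = [h for h in truth
                 if any(overlap(a, h) >= min_overlap for a in found)]
    return len(hit_found) / len(found), len(hit_truth) / len(truth)

found = [(10, 109), (300, 399), (600, 699)]   # hypothetical algorithm output
truth = [(20, 119), (310, 500)]               # hypothetical annotator output
precision, recall = precision_recall(found, truth)
```

In this toy case only the first returned sequence matches an annotated one, so precision is 1/3 and recall is 1/2; a different overlap measure (for example intersection over union) would change which pairs count as matches.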

Figure 9 shows the precision/recall graph.

Fig. 8: Running time of Algorithm Top-k PUA on the parking lot dataset,
plotted against video length (min). Panels: (a) varying k ($\tau$ = 0.6, L = 200);
(b) varying $\tau$ (k = All, L = 200); (c) varying L ($\tau$ = 0.6, k = All);
(d) varying the number of worlds ($\tau$ = 0.6, k = All, L = 200).

Fig. 9: Precision and Recall for Algorithms Top-k TUA and
Top-k PUA (parking lot dataset).

Precision and recall when $\tau$ = 0.4, 0.6, 0.8 are shown in
Tables 2 and 3: precision ranges from 59.65% to 72.22% and
recall from 71.67% to 89.17%, so the framework achieved
good accuracy.


$\tau$    Precision (%)    Recall (%)
0.4       62.5             89.17
0.6       66.67            82.5
0.8       72.22            71.67

TABLE 2: Precision and recall of Algorithm Top-k TUA
on the parking lot dataset.


$\tau$    Precision (%)    Recall (%)
0.4       59.65            77.38
0.6       64.91            74.6
0.8       70.18            71.83

TABLE 3: Precision and recall of Algorithm Top-k PUA
on the parking lot dataset.


7.2  Airport  surveillance  video 

We also tested our algorithms with an airport video surveillance
dataset [23].

Runtime of Top-k TUA. This data set is far more complex
(in terms of number of possible worlds) than the parking
lot data set: the “naive” algorithms did not terminate in a
reasonable amount of time, even with a video of 5 minutes.
We therefore do not show the running times of the naive
algorithms. As in the case of the parking lot data set, we
varied the k, $\tau$, L, and # worlds parameters, as shown in
Table 4.

Fig. 10: Running time of Algorithm Top-k TUA on the airport dataset.
Panels: (a) varying k ($\tau$ = 0.6, L = 600); (b) varying $\tau$ (k = All, L = 600);
(c) varying L ($\tau$ = 0.6, k = All); (d) varying the number of worlds
($\tau$ = 0.6, k = All, L = 200).


Parameter    Values                   Default value
k            1, 2, 5, All             All
$\tau$       0.4, 0.6, 0.8            0.6
L            480, 600, 720, 840       600
# worlds     2E+09, 6E+20, 1E+28      1E+28

TABLE 4: Parameter values used in the experiments for
Algorithm Top-k TUA (airport dataset).

Varying k. Figure 10a shows that Top-k TUA’s runtime
varies little with k when the video is up to 15 minutes long.
After that, the runtimes for k = 1, 2, 5 are comparable, but
the runtime for k = All starts to diverge from them.

Varying $\tau$. Figure 10b shows that the runtime when
$\tau$ = 0.4 is much higher than when $\tau$ = 0.6 and $\tau$ = 0.8
(the latter two cases do not show substantial differences in
running time).

Varying L. Figure 10c shows that higher values of L yield
lower runtimes. Though the difference is small for videos
under 15 minutes, it becomes marked for 20-minute-long
videos.

Varying Number of Possible Worlds. Figure 10d shows
that runtimes for different numbers of possible worlds are
initially close (up to 15 minutes); then, the runtime for
1E+28 possible worlds gets higher. There is only a moderate
increase in runtime corresponding to a huge increase of the
number of possible worlds; hence, Top-k TUA is able to
scale well when the video gets substantially more complex.

Runtime of Top-k PUA. We conducted experiments with
k = 1, 5, 10, All; other parameters were varied according
to Table 4.

Varying k. Figure 11a shows that the runtime decreases as
k decreases.

Varying $\tau$. Figure 11b shows that the runtimes for $\tau$ = 0.4
and $\tau$ = 0.6 are similar and higher than the runtime for
$\tau$ = 0.8.

Varying L. Figure 11c shows that lower values of L give
higher running times. The runtimes are similar for L = 480
and L = 600 (the numbers of PUAs found in the video
are similar in both cases). Execution times are lower for
L = 720 and much lower for L = 800 (in this case, the
number of PUAs found in the video is approximately half
the number of PUAs found with L = 480 and L = 600).

Varying Number of Possible Worlds. Figure 11d shows
that though the runtime grows with the number of possible
worlds, Top-k PUA responds well to the steep growth of
the number of possible worlds.

Fig. 11: Running time of Algorithm Top-k PUA on the airport dataset,
plotted against video length (min). Panels: (a) varying k ($\tau$ = 0.6, L = 600);
(b) varying $\tau$ (k = All, L = 600); (c) varying L ($\tau$ = 0.6, k = All);
(d) varying the number of worlds ($\tau$ = 0.6, k = All, L = 200).


Precision/Recall. We evaluated the accuracy of Top-k TUA
(resp. Top-k PUA) in the same way as for the parking lot
data set. The precision/recall graph is reported in Figure 12
and shows that we achieved high accuracy (see also
Tables 5 and 6).


Fig.  12:  Precision  and  Recall  for  Algorithms  Top-k  TUA 
and  Top-k  PUA  (airport  dataset). 


$\tau$    Precision (%)    Recall (%)
0.4       56.48            80.35
0.6       78.79            76.25
0.8       81.82            73.99

TABLE 5: Precision and recall of Algorithm Top-k TUA
on the airport dataset.


r      Precision (%)   Recall (%)
0.4    72.62           77.12
0.6    75              73.59
0.8    76.19           71.5

TABLE  6:  Precision  and  recall  of  Algorithm  Top-k  PUA 
on  the  airport  dataset. 

7.3  Experimental  Conclusions 

Our  experiments  show  that: 

(i)  Runtime  increases  with  video  length  (because  there  are 
more  possible  worlds,  causing  LC(v,£)  and  NLC(v,£)  to 
have  more  variables  and  constraints).  Despite  the  enormous 
blow-up  in  the  number  of  possible  worlds,  our  algorithms 
perform  very  well. 

(ii) Runtime increases with the number of totally or partially
unexplained activities present in the video. This is because
determining the exact endpoints of each TUA (resp. PUA)
is costly. Specifically, determining the exact end frame
of a TUA requires computing Vt many times: when a
TUA is found, Top-k TUA (and also Top-k TUC) must
go through the while loop of lines 7-14, the binary
search in the while loop of lines 16-25, and the if block
of lines 26-31. All these code blocks require Vt to be
computed. Likewise, determining the exact start and end
frames of a PUA requires Vp to be computed many times, as
Algorithm Top-k PUA (as well as Algorithm Top-k PUC)
goes through several loops and binary searches (one to
determine the start frame, another to determine the end
frame), each requiring multiple computations of Vp.

(iii) In general, the number of TUAs and PUAs in the video
decreases as r and L increase, because higher values of r
and L are stricter conditions for a sequence to be totally or
partially unexplained.

(iv) Runtime decreases as k decreases because our
algorithms use k intelligently to infer that certain sequences
are not going to be in the result (aborting the loops and
binary searches mentioned above).

(v) Precision increases whereas recall decreases as r
increases. The experimental results show that a good
compromise can be achieved by setting r to at least 0.6 and
that our framework achieved good accuracy on both of the
datasets we considered.
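
The effect described in (iv) can be illustrated with a generic top-k pruning sketch. This is not the Top-k TUA or Top-k PUA algorithm: the function top_k_with_pruning and its upper_bound and exact_score callables are hypothetical stand-ins, where exact_score plays the role of the costly endpoint refinement and upper_bound is a cheap estimate assumed never to underestimate it.

```python
import heapq


def top_k_with_pruning(candidates, k, upper_bound, exact_score):
    """Return (ranked results, number of exact_score calls).

    Hypothetical sketch: a candidate is skipped without calling the
    expensive exact_score when its cheap upper_bound cannot beat the
    current k-th best score.
    """
    if k <= 0:
        return [], 0

    heap = []  # min-heap of (score, index, candidate); heap[0] is the k-th best
    exact_calls = 0
    for index, cand in enumerate(candidates):
        if len(heap) == k and upper_bound(cand) <= heap[0][0]:
            continue  # cannot enter the top k, so abort early
        score = exact_score(cand)
        exact_calls += 1
        if len(heap) < k:
            heapq.heappush(heap, (score, index, cand))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, index, cand))

    # Highest score first; ties broken by original position.
    ranked = sorted(heap, key=lambda item: (-item[0], item[1]))
    return [(cand, score) for score, _, cand in ranked], exact_calls


# Hypothetical candidates: (name, cheap upper bound, exact score).
cands = [("a", 0.9, 0.8), ("b", 0.5, 0.4), ("c", 0.95, 0.9), ("d", 0.7, 0.6)]
bound = lambda c: c[1]
exact = lambda c: c[2]
print(top_k_with_pruning(cands, 1, bound, exact)[1])  # exact calls for k = 1
print(top_k_with_pruning(cands, 4, bound, exact)[1])  # exact calls for k = 4
```

With a smaller k the threshold for entering the result rises sooner, so more candidates are discarded before the expensive computation, which is consistent with the runtime trend reported above.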

8  Conclusions 

Suppose we have a video v and a set A of “known” activities
(normal or suspicious). In this paper, we address the
problem of finding subsequences of v that are not “sufficiently
well” explained by the activities in A. We formally define
what it means for a video sequence to be unexplained by
providing the notions of totally and partially unexplained
activities. We propose a possible worlds framework and
identify interesting properties that can be leveraged to make
the search for unexplained activities highly efficient via
intelligent pruning. We leverage these properties to develop
the Top-k TUA, Top-k PUA, Top-k TUC, and Top-k PUC
algorithms, which find the totally and partially unexplained
activities with the highest probabilities. We conducted a detailed
experimental evaluation over two datasets showing that our
approach works well in practice in terms of both running
time and accuracy.